Coding systems have become popular methods of 
cataloging the verbal and nonverbal interaction occurring during 
marital and family therapy. One such system, Pinsof's (1981) Family 
Therapist Coding System (FTCS), was the first designed explicitly to 
identify and differentiate specific verbal behaviors of family 
therapists independent of their theoretical orientation. To test the 
system's interrater and intrarater reliability, data were coded from 
typed transcripts of six audio-taped marital therapy sessions. Coders 
were two undergraduate students trained for about seven hours each. 
The codes fall into three categories (verb, phrase, and speech 
clause) and are ranked hierarchically such that only one code 
is assigned to each of the nine scales within the categories. The 
results indicated low observer agreement for overall session 
reliability and for category reliability. The low reliability did not 
appear to be due to observer drift or actual therapy sessions, but to 
the expertise and experience of the coders. The primary contributor 
to low reliability appeared to be the individual codes. Most of the 
codes with lower percentage of agreement values appeared to be less 
clearly defined and more difficult to apply to the data. The FTCS 
does not appear to be a reliable or practical assessment tool for 
determining the effectiveness of the therapist's statements during 
ongoing marital therapy sessions. (LLL) 
Introduction 

Coding systems have become popular methods of cataloging the verbal and 
nonverbal interaction occurring during marital and family therapy. Although 
it appears that such techniques are necessary for the study of the process in 
marital and family interaction, few replications of research involving coding 
systems have been conducted, and only a small percentage of these research 
studies have reported reliability statistics. Most research on marriage and 
family coding systems has reported overall interrater reliability in terms of 
percentage of agreement. Few of the studies gave specific, detailed 
descriptions of the sampled behaviors or the scoring unit, and most failed to 
state which codes were reliable and which were not. 

In order to assess the usefulness and accuracy of therapy coding systems, 
thorough studies must be conducted to determine the reliability of their 
application since conclusions cannot be drawn nor hypotheses tested until 
these coding systems are found to be both reliable and valid. 

Purpose 

The purpose of this study was to test the reliability of an application 
of Pinsof's (1981) Family Therapist Coding System (FTCS) in actual ongoing 
marital therapy sessions. In planning this study, the decision was made to 
analyze the therapist's statements during the therapy process. The 
effectiveness of specific therapist statements is critical to the therapy 
process and to behavior change in the clients. The decision to use the FTCS was 
based on the belief that the FTCS is the most complex and reliable system 
developed thus far to describe a therapist's interaction in marital and family 
therapy. 
The Family Therapist Coding System 

The Family Therapist Coding System (FTCS) was the first coding system 
designed explicitly to identify and differentiate specific verbal behaviors of 
family therapists independent of their theoretical orientation. The FTCS 
consists of 9 nominal scales, each of which contains numerous qualitatively 
distinct categories and sub-categories. In addition, the therapist's verbal 
behavior is coded within the context of the therapy interaction; that is, 
client statements can be used to clarify the therapist's statement. The FTCS 
is applied to written transcripts of therapy sessions and therefore allows 
unitization prior to the coding process. 

Statistics 

While the most commonly used statistic for non-parametric data is 
percentage of agreement, this measure has several problems: (1) it does not 
take into account the chance occurrence of agreement, thereby resulting in an 
inflated reliability estimate; (2) percentage of agreement does not have metric 
properties, and therefore comparisons with other statistical measures are not 
possible; (3) percentage scores do not provide information about the sources 
of measurement error (e.g., errors of commission vs. errors of omission); (4) 
since percentage of agreement varies with the size of the time/event interval 
used, percentage of agreement scores are unrealistic when the rate of behavior 
is either very low or very high; and (5) it is difficult to put 
percentage of agreement differences in perspective without knowledge of 
within-subject variability. In the present study, in addition to adjusted 
percentage of agreement, Cohen's Kappa was applied to the data in order to 
enhance our understanding and knowledge of the coding scale's reliability. 

Cohen's Kappa provides a superior statistic for reliability of 
non-parametric data since it accounts for the frequency with which coders use 
each category in the scale and also the extent to which a score differs from 
chance (Hollenbeck, 1978). Further, Kappa has a number of advantages for use 
with nominal scale data: first, it is easy to compute, and second, it is 
valuable in training coders since it allows one to see each category and 
determine where the disagreements and agreements occur, thus making 
differences in scoring easy to detect. Finally, Cohen's Kappa, unlike 
percentage of agreement, has metric properties which permit comparison of the 
results. One disadvantage of Cohen's Kappa is that the derived value decreases 
with increased amounts of data; thus, the important results may tend to be 
suppressed. 
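
The contrast between the two statistics can be made concrete with a small sketch. The Python below is illustrative only: it is not the CRESCAT program used in this study, the function names are hypothetical, and the code labels in the example data are invented. It computes the simple (unadjusted) percentage of agreement and Cohen's Kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each coder's category frequencies.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Proportion of units on which two coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of units")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's Kappa for two coders assigning one nominal code per unit."""
    p_o = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    # Chance agreement: product of the two coders' marginal proportions,
    # summed over every code (a code unused by one coder contributes 0).
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_e == 1.0:
        # Both coders used a single identical code; Kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data: one code dominates, so chance agreement is high.
skewed_a = ["D"] * 9 + ["Q"]
skewed_b = ["D"] * 8 + ["Q", "D"]
print(percent_agreement(skewed_a, skewed_b))
print(cohens_kappa(skewed_a, skewed_b))
```

In this invented example the coders agree on 8 of 10 units, yet Kappa falls slightly below zero, because two coders who both use one code almost exclusively would agree that often by chance.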

Questions 

Is the Family Therapist Coding System reliable in terms of session reliability 
and category reliability? 

1) Session Reliability 

a) What effects do the sequence of therapy sessions, the individual 
differences in coders, the experience and expertise of the coders, coder 
drift, and the order in which the therapy sessions were coded have on 
interrater reliability? 

b) What effects do the sequence of therapy sessions, the order in which the 
therapy sessions were coded, and coder drift have on intrarater 
reliability? 

2) Category Reliability 

a) What effects do the three primary categories have on the reliability? 

b) What effects do the matches and non-matches of the individual codes have 
on the reliability? 
Method 

Two undergraduate students and the researcher coded data from typed 
transcripts of audio-taped therapy sessions. Six sessions were coded for 
interrater reliability. These sessions were chosen to sample the range of the 
therapy sequence and were randomly assigned to the coding sequence. Prior to 
coding, each verb, phrase, and speech clause was delineated so that each of 
the coders would code each clause within the correct category. The codes are 
ranked hierarchically such that only one code is assigned to each of the 9 
scales within the 3 categories. In order to obtain a measure of intrarater 
reliability, one of the coders recoded two-thirds of five previously coded 
sessions, in the same order as before. The two coders were trained for 
approximately seven hours each. Each coder was checked for accuracy against 
the researcher several times throughout the coding process. 

Results 

The data were analyzed by means of a computer program designed to 
calculate Cohen's Kappa and adjusted percentage of agreement for the overall 
session and individual category error rate (CRESCAT: Software for real-time 
analysis, 1981). In addition, the CRESCAT program designated the percentage 
of error for each category as well as the specification and location of the 
disagreement. Analyses were run for each coding pair for each of the therapy 
sessions. 
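
As a rough illustration of this kind of per-code output, the sketch below tallies matches and non-matches for each code and records where two coders disagree. It is a hypothetical reconstruction, not the CRESCAT program: the function name, the rule that a disagreement counts as a non-match for both codes involved, and the simple proportion matches / (matches + non-matches) are all assumptions, and the adjusted percentage of agreement reported in this study may have been computed differently.

```python
from collections import defaultdict


def per_code_agreement(coder_a, coder_b):
    """Tally matches and non-matches per code and list the disagreements."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must rate the same set of units")
    matches = defaultdict(int)
    non_matches = defaultdict(int)
    disagreements = []  # (unit index, code from coder A, code from coder B)
    for unit, (a, b) in enumerate(zip(coder_a, coder_b)):
        if a == b:
            matches[a] += 1
        else:
            # Assumption: a disagreement is a non-match for both codes used.
            non_matches[a] += 1
            non_matches[b] += 1
            disagreements.append((unit, a, b))
    table = {}
    for code in sorted(set(matches) | set(non_matches)):
        m, nm = matches[code], non_matches[code]
        table[code] = {"matches": m, "non_matches": nm, "agreement": m / (m + nm)}
    return table, disagreements


# Hypothetical codes for ten units from two coders.
coder_a = ["D", "D", "D", "D", "Q", "Q", "Q", "Q", "L", "L"]
coder_b = ["D", "D", "D", "Q", "Q", "Q", "D", "L", "L", "L"]
table, disagreements = per_code_agreement(coder_a, coder_b)
for code, row in table.items():
    print(code, row)
print(disagreements)
```

Listing the disagreements by unit shows which pairs of codes are being confused, which is the information needed to judge whether a particular code is poorly defined.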

The results indicate that in terms of the sequence of therapy sessions 
and the order in which the sessions were coded, there is little change in 
interobserver agreement and that the agreement obtained is not more than what 
would be expected by chance alone. In terms of all three coding categories, 
interobserver agreement was below .66 (Percentage of Agreement) and .24 
(Cohen's Kappa). 
The retest-reliability analyses indicate some increase in reliability 
over time, according to the order in which the sessions were coded. Also, the 
retest-reliability scores are higher than would be expected by chance alone 
for the speech clause category and for the phrase category after the first 
two sessions. 

Looking at the individual codes, Table 1 shows the number of matches and 
non-matches plus the percentage of agreement for each code within the phrase 
category. The phrase category is divided into seven units; only one code from 
each scale is applied to each phrase clause. Only seven codes had above 50% 
agreement; the remainder of the scores were quite low. 

Table 2 shows the individual code statistics for the verb category. 
Table 3 shows the individual code statistics for the speech clause category. 
One code, IS (Isolate), reached .82. The remaining 5 codes were quite low, 
below .20. 

Discussion 

The results indicate low observer agreement for both overall session 
reliability and category reliability. The low reliability does not appear to 
be due to observer drift or the actual therapy sessions. However, the 
expertise and experience of the coders did appear to affect the reliability 
results. 

The primary contributor to the low reliability appears to be the 
individual codes; only nine codes within the three categories received 
agreement above 50%. The verb codes show the lowest reliability results, with 
only two codes showing percentage of agreement above 50%. The phrase category 
shows somewhat higher reliability results in 5 of the 7 scales. This may be 
due, in part, to the fact that most of the scales contain fewer codes in 
comparison to the verb category. The low reliability results shown in the 
Intervention scale may be due to the effects of coder expertise. That is, 
only the researcher was knowledgeable about therapist intervention techniques 
prior to the beginning of the study. In addition, the Intervention scale did 
not adequately represent or depict the range and specificity of the 
interventions contained in the sessions coded for this study. 

The speech clause category alone shows a slightly higher rate of 
agreement than expected by chance as well as a slight tendency to increase in 
reliability across sessions. This may be due to the fact that the speech 
clause category contains far fewer codes than the other 2 categories. Also, 
the speech clause codes are applied to much less data. 

The codes that show higher percentage of agreement values in all three 
categories appear to be more easily differentiated from the other codes. 
Most of the codes with lower percentage of agreement values appeared to be 
less clearly defined and more difficult to apply to the data. 

Implications 

In conclusion, the FTCS does not appear to be a reliable or practical 
assessment tool for determining the effectiveness of the therapist's 
statements during ongoing marital therapy sessions. The Intervention codes, 
which would appear to be the most useful scale in this regard, did not have 
any percentage of agreement scores above 38%. The time involved in training 
the coders and in preparing the data for coding limits the practicality of the 
FTCS. Future studies should have each coder apply a different coding scale, 
one that has already been shown to be reliable, to dummy tapes prior to 
beginning the actual study, to ensure that the low reliability is not due to 
the individual coders. 
TABLE 1 

MATCHES, NON-MATCHES, AND PERCENTAGE OF AGREEMENT 
FOR EACH 'PHRASE' CODE CATEGORY ** 

    CODE      #   MATCHES   NON-MATCHES   PERCENTAGE OF AGREEMENT

1.  DO        1        00            05           .00
    SO        2        58            56           .30
    K         3        02            02           .25
    DN        4        28            78           .38
    TR        5        00            04           .00
    EM        6        52           173           .19
    S         7       162           204           .36
    BR        8        00            10           .00
    EN        9        00            08           .00
    PR       10        00            16           .00
    C        11       140           212           .31
    PM       12        00            23           .00
    ST       13        36           150           .14
2.  N        14       158           274           .43
    CR       15       410           316           .55
    F        16        18            34           .34
    P        17        60           102           .20
    AT       18        00            15           .00
3.  CT       19        --            --            --
    FM       20        --            --            --
    PC       21        --            --            --
    CP       22        42            88           .24
    WM       23       462           163           .70
    HF       24       468           188           .70
    C+       25        --            --            --
    CI       26        --            --            --
    C2       27        --            --            --
    NS       28        --            --            --
4.  G        29        --            --            --
    TC       30        76           126           .29
    DC       31       182           224           .39
    MC       32       252           228           .49
             33       262           370           .42

** M01 Phrase Clauses 

1. Intervention scale            3. To Whom scale 
2. Temporal Orientation scale    4. Interpersonal Structure scale 
TABLE 1 (CONTINUED) 

MATCHES, NON-MATCHES, AND PERCENTAGE OF AGREEMENT 
FOR EACH 'PHRASE' CODE CATEGORY ** 

    CODE      #   MATCHES   NON-MATCHES   PERCENTAGE OF AGREEMENT

    HT       34        00            38           .00
    TY       35        --            --            --
    CG       36        00            09           .00
    TFO      37        00            03           .00
    CFO      38        00            09           .00
    MFO      39        04            01           .80
    PFO      40        00            08           .00
    SFO      41        00            15           .00
    EFO      42        00            01           .00
    NFO      43        --            --            --
    TNF      44        14           114           .07
    CNF      45        00            07           .00
    MNF      46       440           216           .63
    PNF      47        00            06           .00
    SNF      48        00            08           .00
    ENF      49        02            02           .33
    NNF      50        04            42           .04
    OTF      51        04            29           .08
    OT       52        02            29           .07
    OF       53        02            20           .03
    O        54        00            10           .00
    IT       55        00            06           .00
    D!       56        02            37           .01
    I        57        00            03           .00
    D        58       947            69           .94
    CD       59        04            23           .12
    QO       60       134           131           .47
    QC       61       12c           149           .46
    L        62       670           166           .77
    AA       63        00            25           .00
    INCL     64        00            06           .00

** M01 Phrase Clauses 

5. System Membership scale    7. Grammatical Form scale 
6. Route scale                8. Event Relationship scale 
TABLE 2 

MATCHES, NON-MATCHES, AND PERCENTAGE OF AGREEMENT 
FOR EACH 'VERB' CODE CATEGORY ** 

    CODE      #   MATCHES   NON-MATCHES   PERCENTAGE OF AGREEMENT

    CON       1        68           125           .32
    PE        2        16            60           .09
    NE        3        56            96           .36
    NSE       4        74            88           .36
    PB        5       160           328           .23
    NB        6        46           162           .14
    SB        7        00            06           .00
    NVB       8        00            51           .00
    VB        9       426           300           .51
    NSB      10       500           723           .36
    PC       11        00            62           .00
    NVC      12        00            11           .00
    NLC      13      1058           751           .54
    SP       14        00            26           .00
    EX       15        02            16           .00
    F        16        00            29           .00
    INCL     17        00            02           .00

** V01 Verb Clauses 



TABLE 3 

MATCHES, NON-MATCHES, AND PERCENTAGE OF AGREEMENT 
FOR EACH 'SPEECH CLAUSE' CODE CATEGORY 

    CODE      #   MATCHES   NON-MATCHES   PERCENTAGE OF AGREEMENT

    CV        1        00            10           .00
    FN        2        16            61           .08
    IN        3        00            04           .00
    T         4        --            --            --
    M         5        66           157           .18
    IS        6       820           190           .82
    INCL      7        06            38           .15